114 research outputs found

Evaluation of Linguistic Features for Word Sense Disambiguation with Self-Organized Document Maps

Author: Linden Krister
Publication date: 01/11/2004

Word sense disambiguation automatically determines the appropriate sense of a word in context. We have previously shown that self-organized document maps have properties similar to a large-scale semantic structure that is useful for word sense disambiguation. This work evaluates the impact of different linguistic features on self-organized document maps for word sense disambiguation. The features evaluated are various qualitative features, e.g. part-of-speech and syntactic labels, and quantitative features, e.g. cut-off levels for word frequency. We show that linguistic features help make contextual information explicit. If the training corpus is large, even contextually weak features, such as base forms, act in concert to produce statistically significant sense distinctions. However, the most important features are syntactic dependency relations and base forms annotated with part-of-speech or syntactic labels. We achieve 62.9% ± 0.73% correct results on the fine-grained lexical task of the English SENSEVAL-2 data. On the 96.7% of the test cases that need no back-off to the most frequent sense, we achieve 65.7% correct results.

Peer reviewed

Helsingin yliopiston digitaalinen arkisto (University of Helsinki Digital Repository)

Word Sense Disambiguation with THESSOM

Author: Linden Krister
Publication date: 01/09/2003

Helsingin yliopiston digitaalinen arkisto

Multilingual modelling of cross-lingual spelling variants

Author: Linden Krister
Publication date: 01/01/2006

Peer reviewed

CiteSeerX

Helsingin yliopiston digitaalinen arkisto

Word senses

Author: Linden Krister
Publication venue: [s.n.]
Publication date: 01/01/2005

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Assigning an Inflectional Paradigm using the Longest Matching Affix

Author: Linden Krister
Publication date: 01/01/2008

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

A probabilistic model for guessing base forms of new words by analogy

Author: Linden Krister
Publication date: 01/01/2008

Volume: 4919
Host publication title: Computational Linguistics and Intelligent Text Processing: 9th International Conference, CICLing 2008, Haifa, Israel, February 17-23, 2008. Proceedings

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Entry Generation for New Words by Analogy for Morphological Lexicons

Author: Linden Krister
Publication date: 01/01/2009

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Common Infrastructure for Finite-State Based Methods and Linguistics Descriptions

Authors: Koskenniemi Kimmo; Linden Krister; Yli-Jyrä Anssi Mikael
Publication date: 01/01/2006

Finite-state methods have been widely adopted in computational morphology and related linguistic applications. To enable efficient development of finite-state based linguistic descriptions, these methods should be freely available to academic language research and the language technology industry. The following needs can be identified: (i) a registry that maps the existing approaches, implementations and descriptions, (ii) management of the incompatibilities between existing tools, (iii) increased synergy and complementary functionality among the tools, (iv) persistent availability of the tools used to manipulate the archived descriptions, and (v) an archive of free finite-state based tools and linguistic descriptions. Addressing these challenges contributes to building a common research infrastructure for advanced language technology.

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Representing the Translation Relation in a Bilingual Wordnet

Authors: Linden Krister; Niemi Jyrki
Publication venue: European Language Resources Association (ELRA)
Publication date: 23/05/2012

Proceedings volume: 8

This paper describes how translations are represented in the Finnish wordnet, FinnWordNet (FiWN), and how the FiWN database was constructed. FiWN was created by translating all the word senses of the Princeton WordNet (PWN) into Finnish and joining the translations with the semantic and lexical relations of PWN, extracted into a relational (database) format. The approach naturally resulted in a translation relation between PWN and FiWN. Unlike many other multilingual wordnets, FiWN defines its translation relation primarily at the level of individual word senses rather than synsets, which allows more precise translation correspondences. This sense-level relation can easily be projected into a synset-level translation relation, which is used for linking with other wordnets via Core WordNet. Synset-level translations also serve as a default in the absence of word-sense translations. The FiWN data in the relational database can be converted to other formats. In the PWN database format, translations are attached to source-language words, which allows a Web search interface that also works as a bilingual dictionary. Another representation encodes the translation relation as a finite-state transducer.

Peer reviewed
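The projection from sense-level to synset-level translations, and the synset-level fallback, can be illustrated with a minimal Python sketch. It is not FiWN's actual implementation: the dictionaries, the NLTK-style sense names, and the function names `project_to_synsets` and `translate_sense` are hypothetical, and the toy data stands in for the relational database.

```python
from collections import defaultdict

# Hypothetical toy data (not real FiWN records): PWN sense -> PWN synset.
SENSE_TO_SYNSET = {
    "dog.n.01.dog": "dog.n.01",
    "dog.n.01.domestic_dog": "dog.n.01",
    "bank.n.02.bank": "bank.n.02",
}

# Hypothetical sense-level translations: PWN sense -> Finnish words.
# "dog.n.01.domestic_dog" deliberately has no sense-level translation.
SENSE_TRANSLATIONS = {
    "dog.n.01.dog": {"koira"},
    "bank.n.02.bank": {"pankki"},
}


def project_to_synsets(sense_translations, sense_to_synset):
    """Project sense-level translations onto synsets: for each PWN synset,
    take the union of the Finnish words of its translated senses."""
    synset_translations = defaultdict(set)
    for sense, fi_words in sense_translations.items():
        synset_translations[sense_to_synset[sense]].update(fi_words)
    return dict(synset_translations)


def translate_sense(sense, sense_translations, synset_translations, sense_to_synset):
    """Prefer the sense-level translation; otherwise fall back to the
    synset-level translation as a default (empty set if there is none)."""
    if sense in sense_translations:
        return set(sense_translations[sense])
    return set(synset_translations.get(sense_to_synset[sense], set()))


if __name__ == "__main__":
    synsets = project_to_synsets(SENSE_TRANSLATIONS, SENSE_TO_SYNSET)
    print(synsets)
    print(translate_sense("dog.n.01.domestic_dog", SENSE_TRANSLATIONS,
                          synsets, SENSE_TO_SYNSET))
```

Here the untranslated sense `dog.n.01.domestic_dog` receives `{"koira"}` through the synset-level default, while `dog.n.01.dog` keeps its own, more precise sense-level translation.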

Helsingin yliopiston digitaalinen arkisto

Finite-State Spell-Checking with Weighted Language and Error Models: Building and Evaluating Spell-Checkers with Wikipedia as Corpus

Authors: Linden Krister; Pirinen Tommi
Publication date: 01/05/2010

In this paper we present simple methods for constructing and evaluating finite-state spell-checking tools using an existing finite-state lexical automaton, freely available finite-state tools, and Internet corpora acquired from projects such as Wikipedia. As an example, we use a freely available open-source implementation of Finnish morphology, made with traditional finite-state morphology tools, and demonstrate rapid building of Northern Sámi and English spell checkers from tools and resources available on the Internet.

Peer reviewed
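How a weighted error model and a weighted language model combine to rank suggestions can be sketched in plain Python. This is a stand-in, not the finite-state pipeline the paper describes: instead of composing weighted automata, it generates candidates within one edit (deletion, transposition, substitution, insertion) at a fixed, hypothetical cost `edit_weight`, adds a unigram language-model weight (the negative log relative frequency of the word in a corpus), and ranks by the summed weight, lowest first, as in the tropical semiring. The toy counts are invented for illustration, not Wikipedia data.

```python
import math
from collections import Counter

# Invented toy corpus counts standing in for a Wikipedia-derived word list.
TOY_COUNTS = Counter({"the": 50, "then": 10, "than": 8, "they": 12})


def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Error model stand-in: all strings one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return deletes | transposes | replaces | inserts


def language_model(counts):
    """Unigram weights: -log(count / total), so frequent words weigh less."""
    total = sum(counts.values())
    return {w: -math.log(c / total) for w, c in counts.items()}


def suggest(word, lm_weights, edit_weight=5.0, n=3):
    """Accept known words; otherwise rank in-lexicon candidates one edit
    away by error weight + language-model weight (lower is better)."""
    if word in lm_weights:
        return [(word, lm_weights[word])]
    scored = [(cand, edit_weight + lm_weights[cand])
              for cand in edits1(word) if cand in lm_weights]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))[:n]


if __name__ == "__main__":
    lm = language_model(TOY_COUNTS)
    print(suggest("thn", lm))
```

Because every one-edit candidate pays the same error weight here, the language model alone decides the order; a real weighted error model would assign different costs to different edits.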

CiteSeerX

Helsingin yliopiston digitaalinen arkisto